Expert Systems with Applications
Elsevier BV
Preprints posted in the last 90 days, ranked by how well they match the content profile of Expert Systems with Applications, based on 11 papers previously published in the journal. The average preprint has a 0.02% match score for this journal, so any score above that already indicates an above-average fit.
Usuzaki, T.; Matsunbo, E.; Inamori, R.
Despite the remarkable progress of artificial intelligence (AI), exemplified by large language models, how AI technologies can contribute to building the evidence base of evidence-based medicine (EBM) remains an overlooked question. AI that is compatible with EBM is therefore needed. In this paper, we present an example analysis using a variable Vision Transformer that may contribute to this approach.
Hou, J.; Yi, X.; Li, C.; Li, J.; Cao, H.; Lu, Q.; Yu, X.
Predicting response to induction chemotherapy (IC) and overall survival (OS) is critical for optimizing treatment in patients with locally advanced nasopharyngeal carcinoma (LANPC). This study aimed to develop and validate a multi-task deep learning model integrating pretreatment MRI and whole slide images (WSIs) to predict IC response and OS in LANPC. Pretreatment MRI and WSIs from 404 patients with LANPC were retrospectively collected to construct a multi-task model (MoEMIL) for the simultaneous prediction of early IC response and OS. MoEMIL employed multi-instance learning to process WSIs, PyRadiomics and a convolutional neural network (ResNet50) to extract MRI features, and fused the multimodal features through a multi-gate mixture-of-experts architecture. Clustering-constrained attention multiple instance learning and gradient-weighted class activation mapping were applied for visualization and interpretation. MoEMIL effectively stratified patients into good and poor IC response groups, achieving areas under the curve of 0.917, 0.869, and 0.801 in the training, validation, and test sets, respectively, and outperformed the deep learning radiomics model, the pathomics model, and TNM staging. The model also stratified patients into high- and low-risk OS groups (P < 0.05). MoEMIL shows promise as a decision-support tool for early IC response prediction and prognostication in LANPC. Author summary: We have developed a deep learning model that integrates two types of medical images, magnetic resonance imaging (MRI) and digital pathology slides, to simultaneously predict response to induction chemotherapy and prognosis in patients with locally advanced nasopharyngeal carcinoma. Current treatment decisions rely primarily on traditional tumor staging (TNM), which often fails to reflect the full complexity of the disease.
Our model, named MoEMIL, was trained and tested on data from 404 patients across two hospitals and consistently outperformed both single-modality models and TNM staging. By identifying patients who are likely to respond poorly to induction chemotherapy or who carry a higher prognostic risk, our tool could help clinicians personalize treatment, intensifying management for high-risk patients and sparing low-risk patients unnecessary side effects. Additionally, we visualize the model's reasoning process through heat maps, which highlight the image regions that most influence its predictions. This work represents a step toward more precise treatment for nasopharyngeal carcinoma; however, larger-scale prospective studies are required before the model can be integrated into routine clinical practice.
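To make the fusion step concrete, here is a minimal pure-Python sketch of a multi-gate mixture-of-experts (MMoE) forward pass: shared experts transform the fused feature vector, and each task (here, IC response and OS risk) has its own softmax gate that mixes the expert outputs before a task-specific tower. The linear experts, toy weights, and the names mmoe_forward and towers are illustrative assumptions, not MoEMIL's actual layers, which the abstract does not specify.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mmoe_forward(x, experts, gates, towers):
    """Multi-gate mixture-of-experts forward pass (hypothetical linear experts).

    x: fused feature vector; experts: list of (d_out x d_in) weight matrices;
    gates: one (n_experts x d_in) gating matrix per task; towers: one callable per task.
    Returns (task_outputs, gate_weights_per_task).
    """
    expert_outs = [matvec(W, x) for W in experts]  # experts are shared by all tasks
    outputs, gate_weights = [], []
    for G, tower in zip(gates, towers):
        g = softmax(matvec(G, x))  # task-specific mixing weights over experts
        mixed = [sum(g[i] * expert_outs[i][j] for i in range(len(experts)))
                 for j in range(len(expert_outs[0]))]
        outputs.append(tower(mixed))
        gate_weights.append(g)
    return outputs, gate_weights


# Hypothetical toy weights: 2 experts mapping a 3-d fused feature to 2-d.
experts = [[[1, 0, 0], [0, 1, 0]],
           [[0, 0, 1], [1, 1, 1]]]
gates = [[[0, 0, 0], [0, 0, 0]],   # task 1 (IC response): uniform gate
         [[1, 0, 0], [0, 0, 0]]]   # task 2 (OS risk): favors expert 1 when x[0] > 0
towers = [lambda h: 1 / (1 + math.exp(-sum(h))),  # response probability
          lambda h: sum(h)]                       # unbounded risk score
outputs, gate_weights = mmoe_forward([1.0, 0.5, -0.5], experts, gates, towers)
print(outputs, gate_weights)
```

Because each task learns its own gate, the two heads can weight the shared experts differently, which is the usual motivation for MMoE over a single shared trunk.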
Chen, Y.; Gui, T.; Huang, Z.; Quach, N.; Tu, S.; Liu, J.; Garrett, T. J.; Starkweather, A. R.; Lyon, D. E.; Shepherd, B. E.; Tu, X. M.; Lin, T.
Summary: Chemotherapy for breast cancer (BC) can substantially affect mental wellness. Advances in metabolomics enable comprehensive profiling of metabolic changes during and after treatment, offering insights into the biological mechanisms linking chemotherapy to mental health outcomes. Correlation-based analyses are particularly useful for studying the association between metabolite profiles and mental wellness. Spearman's rho is a widely used correlation measure and a popular alternative to Pearson's correlation, since it also captures monotonic non-linear associations between variables. However, existing methods are not designed for longitudinal data and do not allow for covariate adjustment. In this paper, we propose a novel regression-based framework grounded in a class of semiparametric models, the functional response models, that extends this popular correlation measure to longitudinal settings with missing data under the missing-at-random assumption. The framework supports inference about changes in correlations over time and about the association of explanatory variables with such changes. We use simulation studies to evaluate the performance of the approach with moderate sample sizes. We apply the approach to a one-year longitudinal substudy of the EPIGEN study to examine the longitudinal association between metabolite profiles and mental wellness in BC patients undergoing chemotherapy. The identified metabolites may serve as candidates for future in-depth bioinformatics analyses and translational investigations.
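For reference, the cross-sectional quantity that the framework extends can be computed as Pearson's correlation on average ranks. This sketch shows only that baseline; it does not implement the authors' functional-response-model extension, covariate adjustment, or missing-at-random handling.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman_rho(x, y):
    # Spearman's rho is Pearson's correlation computed on the ranks.
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotone but non-linear
print(spearman_rho(x, y), pearson(x, y))
```

The monotone cubic relationship gives a rank correlation of 1 while Pearson's correlation on the raw values is below 1, which is the property the abstract relies on.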
Cao, X.; Wei, X.; Hou, J.; cai, c.; Wang, Q.
We present a digital twin framework for real-time glucose monitoring and forecasting in septic patients in intensive care units (ICUs). The framework combines advanced machine learning models trained on continuous glucose measurements with a dynamic transfer-learning workflow that enables rapid deployment to individual patients and supports personalized, adaptive, and predictive clinical decision-making. Built on a foundation model (a pretrained time-series transformer), the digital twin continuously updates its parameters as new patient data arrive and produces rolling near-term forecasts in real time. To assess adaptability and computational efficiency, we deployed the pretrained model to ten septic patients and evaluated multiple retraining strategies, including zero-shot inference, linear probing, and full and staged fine-tuning. Results show that the model can be initialized and personalized for a new patient within seconds on a standard laptop while achieving accurate glucose forecasts under varying data conditions. These findings demonstrate the feasibility of real-time model personalization in resource-constrained, high-acuity environments and highlight the potential of digital twins as scalable, AI-enabled platforms for continuous physiological monitoring, clinical decision support, and individualized treatment design in the ICU.
Steinmetz, P.; Frouin, F.; Morard, V.; Buvat, I.
Medical images (MI) exhibit variability due to different acquisition protocols, devices, and patient populations, making failure detection at inference time essential for reliable deployment of clinical classifiers. As existing evaluations of failure detection methods use different settings, it is difficult to compare results and identify the best strategy, if any. We present a comprehensive benchmark of eight confidence scoring functions and two score-aggregation strategies across eight MI tasks spanning diverse modalities, backbone architectures, training setups, and failure sources. The confidence ranking ability and classification error mitigation are jointly evaluated. While no single method systematically dominated across settings, aggregation of confidence scores consistently matched or approached the best individual method and substantially reduced silent failure rate. The failure detection performance was strongly correlated with classifier accuracy for all tested settings. These findings provide large-scale evidence regarding the strengths and limitations of confidence scoring strategies and offer actionable guidance for mitigating silent failures under realistic distribution shifts in MI.
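One simple way to combine confidence scoring functions and measure their ranking ability is sketched below, assuming min-max rescaling followed by averaging as the aggregation rule and AUROC (correct vs. misclassified cases) as the ranking metric. The abstract does not name its two aggregation strategies, so these choices, the name method_b, and the toy scores are assumptions.

```python
def minmax(scores):
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def aggregate(score_lists):
    """Average several confidence scores after min-max rescaling each to [0, 1]."""
    normed = [minmax(s) for s in score_lists]
    return [sum(col) / len(col) for col in zip(*normed)]


def failure_auroc(confidence, correct):
    """P(confidence of a correct case > confidence of a failure); ties count 1/2."""
    pos = [c for c, ok in zip(confidence, correct) if ok]
    neg = [c for c, ok in zip(confidence, correct) if not ok]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(pos) * len(neg))


# Hypothetical scores from two methods on five test cases.
correct = [True, True, False, False, True]
softmax_conf = [0.95, 0.60, 0.70, 0.55, 0.85]
method_b = [5.0, 4.0, 1.0, 2.0, 3.5]
combined = aggregate([softmax_conf, method_b])
for name, s in [("softmax", softmax_conf), ("method_b", method_b), ("aggregated", combined)]:
    print(name, round(failure_auroc(s, correct), 3))
```

In this toy example the aggregate matches the better individual score, in line with the abstract's observation that aggregation matched or approached the best single method; it is not evidence of that finding.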
Sivakumar, E.; Anand, A.
Computer vision and deep learning techniques, including convolutional neural networks (CNNs) and transformers, have increased the performance of medical image classification systems. However, training deep learning models using medical images is a challenging task that necessitates a substantial amount of annotated data. In this paper, we implement data augmentation strategies to tackle dataset imbalance in the VinDr-SpineXR dataset, which has a lower number of spine abnormality X-ray images compared to normal spine X-ray images. Geometric transformations and synthetic image generation using Generative Adversarial Networks are explored and applied to the abnormal classes of the dataset, and classifier performance is validated using VGG-16 and InceptionNet to identify the most effective augmentation technique. Additionally, we introduce a hybrid augmentation technique that addresses class imbalance, reduces computational overhead relative to a GAN-only approach, and achieves approximately 99% validation accuracy with both classifiers across all three case studies.
Brann, E.; Polle, R.; Cepukaityte, G.; Georgescu, A. L.; Parsons, O.; Molimpakis, E.; Goria, S.
Accessible screening for type 2 diabetes (T2D) is critical, with millions of cases remaining undiagnosed globally. Here, we present the largest known real-world validation study of a speech-based T2D prediction model, trained on speech data from over 21,000 individuals, that works on features extracted from 20-second speech recordings. The model was evaluated in two stages: (1) against self-reported diagnoses in 7,319 English-speaking participants, using AUC, and (2) against HbA1c blood tests in a subset of 801 participants drawn from the full cohort. Performance was also compared against QDiabetes and assessed in the presence of key confounding variables. The model demonstrated clinically useful predictive capacity on self-reported data (AUC = 0.80 ± 0.03), approaching that of QDiabetes (AUC = 0.86 ± 0.03). It was robust to most demographic confounders (e.g., age and sex) and to medication use, with reduced performance in the presence of comorbidities (e.g., cardiovascular disease and hypertension). At a diabetes threshold of HbA1c ≥ 48 mmol/mol, the model achieved an AUC of 0.75 (± 0.07). This biomarker-validated speech-based tool has the potential to complement existing methods through accessible, scalable screening that requires only a 20-second speech sample.
Sozol, S. S.; Dev Nath, B. C.; Fahim, F. M. S.; Suzana, N. N.; Mirza, J. F.; Ahmmed, S.; Zohra, F.-T.; Zafr, A. H. A.; Uddin, M. N.; Mondal, M. R. H.; Hoque, A. S. M. L.
Machine learning (ML) is being considered as an aid to diagnosing cardiovascular diseases (CVD). However, challenges such as inconsistent and limited datasets, limited infrastructure, and global inequalities create a need for a reliable and practicable ML solution. This paper presents an ML-driven framework for predicting CVD risk scores and classifying CVD status. Several data preprocessing techniques are considered, including multiple imputation by chained equations (MICE) and outlier removal. In addition, hyperparameters are tuned with GridSearchCV. Moreover, a consensus-driven feature selection method is applied to identify five optimal predictors. The dataset used in this study contains healthcare records related to future CVD risk scores, comprising 1,529 patient records with 22 features. The optimized stacked ensemble model achieves a cross-validated coefficient of determination (R²) of 98.13% for CVD risk score regression. Comparative evaluation against other ML models confirmed improved accuracy, efficiency, and interpretability. The explainable AI technique SHAP is applied to interpret predictions and highlight key risk factors. Moreover, a deployment-ready web platform with multi-role access has been developed to demonstrate clinical applicability. The proposed framework offers a reliable and interpretable tool for early detection of CVD and personalized risk assessment. In the future, this work can be extended to integrate longitudinal data, medical imaging, and deep learning to improve generalizability and strengthen real-world impact.
Alsaiari, A.; Turki, T.; Taguchi, Y.-h.
Ovarian cancer is a gynecological cancer that can be fatal if it metastasizes and is not detected early. Accurately predicting drug response in ovarian cancer is therefore needed. A gynecological pathologist inspects tissues for abnormalities and then reports on the patient; however, this diagnostic process is (1) difficult, (2) dependent on experience, and (3) time-consuming. Moreover, existing tools are far from perfect. Hence, we present a computational pipeline to improve drug response prediction in ovarian cancer, built as follows. First, we downloaded digital pathology images of ovarian cancer bevacizumab response from The Cancer Imaging Archive (TCIA) repository. We applied histogram of oriented gradients (HOG) to the images to construct feature vectors, which were passed to Fisher linear discriminant analysis for dimensionality reduction. The reduced-dimensionality data were then used for support vector regression with various kernels, and performance was measured by the area under the ROC curve (AUC). In experiments against transformer-based models (ViT and Swin) and other deep learning (DL) models (VGG16, ResNet50, InceptionV3, MobileNetV2, and EfficientNetB6), our approach with a radial kernel (named SVRD+R) improved AUC by 17% over the best-performing transformer-based model (ViT) and by 14.9% over the best DL-based model (MobileNetV2). These results demonstrate the superiority and feasibility of our AI-based pipeline for prediction problems in gynecologic cancer studies. MSC: 92B05; 68T09
Lin, G.; Miao, R.; Sacheck, J.; Zhang, X.
Physical activity (PA) plays an important role in maintaining and improving health. Daily steps are a key PA measure that is easily accessible with common wearable devices. However, methods are lacking to recommend a personalized distribution of daily steps over a period of time that optimizes specific health biomarkers. In this paper, we fill this gap using data from the All of Us Research Program, which include months of step counts as well as repeated measurements of key health biomarkers. We develop a new offline reinforcement learning (RL) algorithm to learn personalized optimal PA distributions with respect to cardiometabolic risk, where the action is a function representing the daily step distribution over a period of time. Simulation studies demonstrate the advantage of the proposed approach over existing continuous-action RL methods. The optimal policy learned from the All of Us data generally recommends more daily steps and a more consistent PA pattern over time, while offering tailored recommendations for subgroups defined by blood glucose level, body mass index, blood pressure, age, and sex.
Sparnon, E.; Stevens, K.; Song, E.; Harris, R. J.; Strong, B. W.; Bruno, M. A.; Baird, G. L.
The present study evaluates the real-world clinical predictive performance of FDA-authorized artificial intelligence (AI) devices used in radiology, focusing on the false positive paradox (FPP) and its implications for clinical practice. To do this, we analyzed publicly available FDA data on AI radiology devices drawn from 2024 and 2025 510(k) summaries, showing that diagnostic accuracy metrics such as sensitivity and specificity do not necessarily translate into a high positive predictive value (PPV), because PPV depends on the prevalence of the target disease. We show the importance of disclosing the false discovery rate (FDR) and false omission rate (FOR) and argue that this transparency enables clinicians to select AI systems that balance false positive and false negative costs in a clinically, ethically, and financially appropriate manner. Finally, we provide recommendations on what data should be provided to best serve practices and radiologists.
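The false positive paradox follows directly from Bayes' rule. A short worked sketch, assuming a hypothetical device with 90% sensitivity and 90% specificity (not figures from the FDA summaries), shows how PPV, FDR, and FOR shift with disease prevalence.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Bayes' rule: convert sensitivity/specificity into PPV, NPV, FDR and FOR."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {"PPV": ppv, "NPV": npv, "FDR": 1 - ppv, "FOR": 1 - npv}


# Hypothetical device with 90% sensitivity and 90% specificity.
for prevalence in (0.30, 0.05, 0.01):
    m = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f} PPV={m['PPV']:.3f} "
          f"FDR={m['FDR']:.3f} FOR={m['FOR']:.4f}")
```

At 1% prevalence, PPV is 0.009 / (0.009 + 0.099) = 1/12, roughly 0.083, so about 11 of every 12 positive flags are false even though sensitivity and specificity are both 90%.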
Ray, P.
Thyroid carcinoma is one of the most prevalent endocrine malignancies worldwide, and accurate preoperative differentiation between benign and malignant thyroid nodules remains clinically challenging. Current diagnostic practice relies on clinicians' subjective evaluation of imaging results and separate clinical tests, which introduces inconsistency and can lead to incorrect assessments. Combining radiological imaging with clinical information systems can make predictions of patient outcomes more reliable and improve decision-making. This study introduces a multi-source deep learning framework that combines magnetic resonance imaging (MRI) data with clinical text to predict thyroid cancer. The system uses a Vision Transformer (ViT) to extract high-level MRI features, while a domain-adapted language model processes clinical documents containing patient medical history, symptoms, and laboratory results. A cross-modal attention mechanism fuses the imaging and textual features and captures how the two data types are interconnected. A classification layer then maps the fused features to a probability of malignancy. Experimental results show that the proposed multimodal system outperforms the unimodal baselines, with higher accuracy, sensitivity, specificity, and AUC, which could help clinicians make better preoperative decisions.
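A minimal sketch of the cross-modal attention idea, assuming single-head scaled dot-product attention in which MRI patch embeddings act as queries and clinical-text embeddings act as keys and values. Learned query/key/value projections, multiple heads, and the real ViT and language-model embeddings are omitted; the toy vectors are hypothetical.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: image tokens (queries) attend to text tokens (keys/values)."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)  # one weight per text token, summing to 1
        weights.append(w)
        outputs.append([sum(w[j] * values[j][c] for j in range(len(values)))
                        for c in range(len(values[0]))])
    return outputs, weights


# Hypothetical 4-d embeddings: 2 MRI patch tokens, 3 clinical-text tokens.
image_tokens = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]
text_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
fused, attn = cross_attention(image_tokens, text_tokens, text_tokens)
print(attn)
```

Each image token ends up as a weighted mixture of text-token values, so the fused representation carries textual context aligned to specific image regions before classification.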
Srinivasan, A.; Sritharan, D. V.; Chadha, S.; Fu, D.; Hossain, J. O.; Breuer, G. A.; Aneja, S.
Purpose: Deep learning models are increasingly being used in medical diagnostics, but their vulnerability to adversarial perturbations raises concerns about their reliability in clinical applications. Capsule networks (CapsNets) are a promising architecture for medical imaging tasks, given their ability to model spatial relationships and to train with smaller amounts of data. Previous studies have focused on adversarial training to improve robustness, whereas alternative architectures remain an underexplored way to combat poor adversarial stability. Prior work has suggested that CapsNets may be more robust to adversarial perturbations than convolutional neural networks (CNNs), but their performance on adversarial images has not been studied systematically in clinical settings. We evaluated the robustness of CapsNets compared with CNNs and vision transformers (ViTs) across multiple medical image classification tasks. Methods: We trained two CNNs (ResNet-18 and ResNet-50), one ViT (MedViT), and two CapsNets (DR-CapsNet and BP-CapsNet) on four distinct medical imaging datasets (PneumoniaMNIST, BreastMNIST, NoduleMNIST3D, and BloodMNIST) and one natural image dataset (MNIST). Models were evaluated on adversarial examples generated by projected gradient descent and the fast gradient sign method across a range of perturbation bounds. Interpretability experiments, including latent space and Gradient-weighted Class Activation Mapping (Grad-CAM) analyses, were conducted to better understand model stability on adversarial inputs. Results: CapsNets demonstrated superior robustness under adversarial perturbations compared with CNNs and ViTs across all medical imaging datasets and the natural image dataset.
Latent space and Grad-CAM visualizations revealed that CapsNets maintained more consistent embedding representations and attention maps after adversarial perturbations than CNNs and ViTs, suggesting that CapsNet robustness is supported, at least in part, by more stable feature encodings. Bayes-Pearson routing further improved robustness over standard dynamic routing in CapsNets without compromising baseline performance, suggesting a potential architectural improvement. Conclusion: CapsNets exhibit intrinsic advantages in adversarial robustness over CNN- and ViT-based models on medical imaging tasks, suggesting they are a reliable alternative for medical image classification. These findings support the use of CapsNets in clinical applications where model reliability is critical.
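The two attacks named in the Methods can be illustrated on a toy logistic-regression classifier, where the input gradient has the closed form (p - y) * w. This is a sketch under that simplifying assumption, not the CNN, ViT, or CapsNet setup evaluated in the study; the weights are hypothetical, and the PGD variant here starts from the clean input rather than a random point inside the perturbation ball.

```python
import math


def predict(w, b, x):
    """Logistic-regression probability of the positive class."""
    return 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))


def bce(p, y):
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def _sign(g):
    return (g > 0) - (g < 0)


def fgsm(w, b, x, y, epsilon):
    """One signed-gradient step that increases the loss, clipped to the [0, 1] pixel range."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]  # d(BCE)/dx for logistic regression
    return [min(1.0, max(0.0, xi + epsilon * _sign(g))) for xi, g in zip(x, grad)]


def pgd(w, b, x, y, epsilon, alpha, steps):
    """Iterated FGSM steps of size alpha, projected back into the L-inf ball of radius epsilon."""
    x_adv = list(x)
    for _ in range(steps):
        x_adv = fgsm(w, b, x_adv, y, alpha)
        x_adv = [min(max(xa, xi - epsilon), xi + epsilon) for xa, xi in zip(x_adv, x)]
    return x_adv


w, b = [2.0, -1.0, 0.5], 0.0   # hypothetical trained weights
x, y = [0.5, 0.5, 0.5], 1      # a correctly classified toy "image"
x_fgsm = fgsm(w, b, x, y, epsilon=0.1)
x_pgd = pgd(w, b, x, y, epsilon=0.1, alpha=0.025, steps=10)
print(bce(predict(w, b, x), y), bce(predict(w, b, x_fgsm), y), bce(predict(w, b, x_pgd), y))
```

The perturbation bound epsilon plays the role of the "range of perturbation bounds" in the Methods: sweeping it and recording accuracy on the perturbed inputs gives a robustness curve per model.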
Wan, S.-Y.; Chen, W.-Y.
Accurate segmentation of nasal and paranasal sinus structures from CT scans is critical for surgical planning and treatment evaluation in rhinology. However, the complex anatomical topology and thin-wall boundaries of these structures pose significant challenges for automated segmentation methods. We propose AFS-DSN (Adaptive Frequency-Spatial Dual-Stream Network), a novel deep learning architecture that integrates multi-scale wavelet decomposition with spatial feature learning for binary segmentation of the nasal cavity complex. Our method employs a dual-stream encoder whose frequency branch uses three wavelet scales (db1, db2, db4) to capture 24 frequency sub-bands, enabling enhanced boundary detection in anatomically challenging regions. Cross-domain attention and adaptive routing mechanisms dynamically fuse spatial and frequency features based on local tissue characteristics. We formulate the task as binary segmentation in which all five anatomical structures (maxillary sinus, sphenoid sinus, ethmoid sinus, frontal sinus, and nasal cavity) are treated as a single foreground region against the background, prioritizing clinical boundary detection over differentiation of individual structures. Evaluated on the NasalSeg dataset (130 CT volumes) with a 70/15/15 train/validation/test split, AFS-DSN achieves an overall Dice coefficient of 94.34% ± 2.30%, with statistically significant improvements in thin-wall regions (91.34% vs. 90.57% baseline, p = 0.004) and in Surface Dice at 1 mm tolerance (0.874 vs. 0.868 baseline, p = 0.010). These results indicate enhanced boundary precision while maintaining sub-second inference time, making the method suitable for surgical planning applications where sub-millimeter accuracy is clinically relevant.
To address concerns regarding model complexity, we further introduce AFS-DSN-Lite, a parameter-efficient variant (27.41M parameters) that achieves comparable performance (94.37% Dice) through depthwise separable convolutions, and validate robustness via 3-fold cross-validation (mean Dice: 94.59% ± 0.31%).
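A minimal sketch of the evaluation metric and of the binary formulation described above, assuming label volumes are flattened to lists with 0 for background and 1 to 5 for the five structures. The toy masks are hypothetical, and Surface Dice, which additionally requires boundary extraction and a distance tolerance, is not shown.

```python
def to_foreground(label_volume):
    """Collapse multi-structure labels (0 = background, 1..5 = structures) into one binary mask."""
    return [1 if v > 0 else 0 for v in label_volume]


def dice(pred, truth):
    """Dice = 2|P & T| / (|P| + |T|) on flattened binary masks.

    Convention assumed here: two empty masks score 1.0.
    """
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2 * inter / total


truth_labels = [0, 1, 1, 2, 5, 0, 0, 3]   # hypothetical flattened label volume
pred_mask = [0, 1, 1, 1, 0, 1, 0, 1]
print(round(dice(pred_mask, to_foreground(truth_labels)), 3))
```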
Viguerie, A.; Iacomini, E.; D'Orsogna, M. R.
Abstract: Alcohol-associated liver disease (ALD) has been steadily increasing in the United States for many years, as attested by increases in ALD deaths and liver transplant demand. Direct measurement of ALD incidence is challenging as diagnosis often occurs late (or not at all). This study employs a demographically-aware backcalculation method, based on mortality data, to reconstruct latent, age-structured ALD risk and incidence trends in the US population from 2008 to 2022 and uses this information to forecast future ALD trends through 2030. We find that ALD incidence has risen steadily since 2008, with a sharp increase during the 2020 COVID-19 pandemic, and that the average age at onset has also increased over time, with demographic factors playing a substantial role. While our forecasts suggest a continuation of the pre-2020 growth in ALD incidence for most age and sex groups, we also predict marked increases among younger men, a generational shift toward older age cohorts, and substantial rises among older females. Most concerning, between 2022 and 2030, incidence is expected to double among younger men and older females and by 2030 the number of new male ALD cases is projected to be more than twice that of females for all age groups. Our results provide a clearer understanding of evolving ALD trends, highlighting the role of demographic and birth cohort effects. We underscore the urgent need for targeted interventions, particularly among younger men, to reduce ALD-related behaviors and future burden.
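Backcalculation inverts the usual forward model, in which observed deaths are a convolution of past incidence with an onset-to-death delay distribution. Below is a minimal single-population sketch, assuming a known, hypothetical delay pmf (which here also absorbs the probability that a case dies of ALD) and using plain EM (Richardson-Lucy) deconvolution rather than the authors' demographically aware, age-structured method. Onsets near the end of the window are the least constrained by the data.

```python
import math


def convolve(incidence, delay_pmf, horizon):
    """Expected deaths mu_t = sum_s incidence[s] * delay_pmf[t - s] for t < horizon."""
    return [sum(incidence[s] * delay_pmf[t - s]
                for s in range(min(t + 1, len(incidence)))
                if t - s < len(delay_pmf))
            for t in range(horizon)]


def poisson_loglik(deaths, mu):
    # Poisson log-likelihood up to a constant; assumes mu > 0 wherever deaths > 0.
    return sum(d * math.log(m) - m for d, m in zip(deaths, mu) if m > 0)


def backcalculate(deaths, delay_pmf, iterations=200):
    """Plain EM (Richardson-Lucy) deconvolution of deaths into onset incidence."""
    T = len(deaths)
    lam = [max(sum(deaths) / T, 1e-9)] * T  # flat, strictly positive start
    # Probability that an onset in period s leads to a death inside the window.
    observed = [sum(delay_pmf[t - s] for t in range(s, T) if t - s < len(delay_pmf))
                for s in range(T)]
    for _ in range(iterations):
        mu = convolve(lam, delay_pmf, T)
        new = []
        for s in range(T):
            if observed[s] == 0:
                new.append(lam[s])  # unidentifiable from the data: keep as-is
                continue
            acc = sum(delay_pmf[t - s] * deaths[t] / mu[t]
                      for t in range(s, T)
                      if t - s < len(delay_pmf) and mu[t] > 0)
            new.append(lam[s] * acc / observed[s])
        lam = new
    return lam


delay_pmf = [0.1, 0.3, 0.4, 0.2]  # hypothetical onset-to-death delay
true_incidence = [10, 12, 15, 20, 26, 30, 33, 35]
deaths = convolve(true_incidence, delay_pmf, len(true_incidence))
estimate = backcalculate(deaths, delay_pmf, iterations=500)
print([round(v, 1) for v in estimate])
```

Each EM update keeps the estimates non-negative and never decreases the Poisson likelihood, which is why this family of updates is a common starting point for backcalculation.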
Zhang, R.
Aims: The oral glucose tolerance test (OGTT) is effective for detecting post-load dysglycemia, but it is burdensome and therefore not routinely used. Continuous glucose monitoring (CGM) offers a convenient way to capture real-world glucose patterns, yet it remains unclear whether CGM-derived metrics reflect OGTT-defined dysglycemia. We therefore aimed to evaluate CGM-derived and clinical metrics for predicting OGTT 2-hour glucose, classifying OGTT-defined dysglycemia, and assessing day-to-day repeatability. Methods: We analyzed a cohort with paired free-living CGM and OGTT. Multiple CGM-derived metrics and clinical measures were compared for prediction of OGTT 2-hour glucose, classification of OGTT-defined dysglycemia, and day-to-day stability. Predictive performance was assessed primarily by leave-one-out (LOO) R², and day-to-day repeatability by intraclass correlation coefficients (ICC). Results: The glycemic persistence index (GPI), a metric integrating the magnitude and duration of glycemic elevation, was the strongest single predictor of OGTT 2-hour glucose (LOO R² 0.439). GPI also showed strong day-to-day repeatability (ICC 0.665) and ranked first on a combined prediction-stability score. For classification of OGTT-defined dysglycemia, HbA1c had a slightly higher AUC than GPI, but GPI plus HbA1c performed best overall, indicating complementary information. Conclusions: GPI was a strong predictor of OGTT 2-hour glucose and showed a favorable balance between predictive performance and day-to-day stability, supporting its potential utility as a CGM-derived marker of dysglycemia.
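Leave-one-out R² can be sketched for a single predictor as follows. The metric values are hypothetical, the GPI formula itself is not given in the abstract, and SS_tot uses the full-sample mean, which is one common convention among several.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r2(y, preds):
    my = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def loo_r2(x, y):
    """Refit without each point, predict it, then score the held-out predictions."""
    preds = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return r2(y, preds)


metric = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # hypothetical CGM-derived metric
glucose_2h = [2.1, 3.9, 6.2, 7.8, 10.5, 11.9]  # hypothetical OGTT 2-h values
a, b = fit_line(metric, glucose_2h)
print(r2(glucose_2h, [a + b * m for m in metric]), loo_r2(metric, glucose_2h))
```

For ordinary least squares, leave-one-out residuals are never smaller in magnitude than in-sample residuals, so LOO R² is at most the in-sample R², which makes it a more conservative measure of predictive performance.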
Agumba, J.; Erick, S.; Pembere, A.; Nyongesa, J.
Objectives: To develop and evaluate a deployable deep learning system with Gradient-weighted Class Activation Mapping (Grad-CAM) for tuberculosis screening from chest radiographs and to assess its classification performance and explainability across desktop and mobile deployment platforms. Materials and methods: This study used publicly available chest X-ray datasets containing Normal and Tuberculosis images. A DenseNet121-based transfer learning model was trained using stratified training, validation, and test splits with data augmentation and class weighting. Model performance was evaluated using accuracy, precision, recall, F1 score, receiver operating characteristic (ROC) curve, and area under the ROC curve (AUC). Grad-CAM was used to visualize regions influencing model predictions. The trained model was converted to TensorFlow Lite and deployed in both a Windows desktop application and a Flutter-based mobile application for offline inference and visualization. Results: The model demonstrated strong classification performance on the independent test dataset, with high accuracy and AUC values indicating effective discrimination between Normal and Tuberculosis cases. Grad-CAM visualizations showed that the model focused primarily on anatomically relevant lung regions, particularly the upper and mid-lung fields in Tuberculosis cases. Deployment testing confirmed consistent prediction outputs and Grad-CAM visualizations across both Windows and mobile platforms. Conclusion: The proposed deployable deep learning system with Grad-CAM provides accurate and interpretable tuberculosis screening from chest radiographs and demonstrates feasibility for offline mobile and desktop deployment. This approach has potential as an artificial intelligence-assisted screening and decision support tool in radiology, particularly in resource-limited and remote healthcare settings.
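The Grad-CAM step can be sketched as follows, assuming the last convolutional layer's activations and the gradients of the target class score with respect to them are already available (in practice from the framework's automatic differentiation). Channel weights are the spatially averaged gradients, and the map is the ReLU of the weighted channel sum, rescaled to [0, 1]; upsampling to the radiograph's resolution is omitted, and the toy feature maps are hypothetical.

```python
def grad_cam(activations, gradients):
    """activations, gradients: K feature maps of shape H x W (nested lists)."""
    K = len(activations)
    H, W = len(activations[0]), len(activations[0][0])
    # Channel importance: global average pooling of the gradients.
    alphas = [sum(sum(row) for row in g) / (H * W) for g in gradients]
    # ReLU keeps only regions that push the class score up.
    cam = [[max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(K)))
            for j in range(W)] for i in range(H)]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]  # rescale to [0, 1]
    return cam


# Hypothetical 2-channel, 2 x 2 feature maps.
A = [[[1.0, 0.0], [0.0, 0.0]],        # channel 0 fires top-left
     [[0.0, 0.0], [0.0, 1.0]]]        # channel 1 fires bottom-right
G = [[[1.0, 1.0], [1.0, 1.0]],        # raising channel 0 raises the class score
     [[-1.0, -1.0], [-1.0, -1.0]]]    # raising channel 1 lowers it
heatmap = grad_cam(A, G)
print(heatmap)
```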
Enywaku, A.; Asiku, R. A.
Severe fetal growth restriction (sFGR) affects 5 to 10% of pregnancies worldwide and is a major contributor to perinatal morbidity and mortality, particularly in low- and middle-income countries (LMICs). Traditional 2D ultrasound detection methods suffer from operator dependency, gestational age uncertainty, and limited access to Doppler in many low-resource facilities. This study presents a deep learning framework for sFGR screening and triage using 2D fetal abdominal ultrasound images designed to operate independently of precise gestational dating. Growth restriction severity labels were derived by mapping abdominal circumference measurements to INTERGROWTH-21st term percentiles as a gestational-age-normalized proxy for fetal size restriction when case-level gestational age or birth-weight data are unavailable. A systematic literature review of 37 studies revealed gaps in severity stratification and generalizability. We implemented a DenseNet-121-based model with abdominal circumference measurement for severity-aware classification using a retrospective single-center dataset of 1588 annotated fetal abdominal images from 169 term pregnancies. Patient-wise 3-fold cross-validation and ensemble testing yielded 93.7% accuracy, a weighted F1-score of 0.76, and ROC AUC ≥ 0.98 per class on held-out data. The approach outperforms previously reported single-center methods on this dataset while explicitly targeting LMIC-specific constraints. It demonstrates potential as a gestational-age-independent first-line triage layer for equitable prenatal screening, subject to prospective multi-site validation.
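Patient-wise cross-validation keeps all images from one pregnancy in the same fold, so a model is never tested on a patient it saw during training. A minimal sketch, assuming a hypothetical list with one patient identifier per image; it balances the number of patients per fold, not the number of images or the class mix.

```python
import random


def patient_wise_folds(patient_ids, k=3, seed=0):
    """Split sample indices into k folds so that every patient's images land in one fold only."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    fold_of = {p: i % k for i, p in enumerate(patients)}  # round-robin over shuffled patients
    folds = [[] for _ in range(k)]
    for idx, p in enumerate(patient_ids):
        folds[fold_of[p]].append(idx)
    return folds


# Hypothetical: 7 pregnancies with several abdominal images each.
ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p5", "p6", "p7", "p7"]
for i, test_idx in enumerate(patient_wise_folds(ids, k=3)):
    train_patients = {ids[j] for j in range(len(ids)) if j not in test_idx}
    test_patients = {ids[j] for j in test_idx}
    print(i, sorted(test_patients), train_patients.isdisjoint(test_patients))
```

scikit-learn's GroupKFold provides the same guarantee if that dependency is acceptable.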
Tegenaw, G. S.; Degu, M. Z.; Gebeyehu, W. B.; Senay, A. B.; Krishnamoorthy, J.; Ward, T.; Simegn, G. L.
Effective public health planning and intervention require an understanding of the temporal and geographic distribution of disease incidence, which in turn requires robust frameworks for forecasting it. However, because case counts and temporal dynamics vary, capturing the distinct patterns of climate-sensitive diseases is challenging, including identifying hotspots, trends, and seasonal variations in incidence. Furthermore, most studies directly predict future incidence from historical patterns and covariates, and a significant gap remains between the proliferation of diverse architectures, trained and validated on standardized, statistically stable benchmark datasets, and epidemiological reality, where data are often irregular, sparse, and highly skewed, with rare but high-magnitude or bimodally distributed incidences. As a result, traditional end-to-end approaches that directly map climate data to disease data often fail in these data-scarce settings because of overfitting and poor generalization. To understand disease epidemiology and mitigate the impact of incidence, we analyzed a decade of retrospective data from Ethiopia to examine how climate and weather conditions influence the incidence and spread of climate-sensitive diseases, including malaria and dysentery. We propose a two-stage hybrid framework, a climate-informed disease prediction model, to forecast the likelihood of disease incidence from decades of climate and weather data. First, deep learning is applied to capture latent weather dynamics. Then, a hurdle model based on Extreme Gradient Boosting (XGB) handles the zero-inflated incidence data, combining an XGBClassifier that predicts whether incidence occurs with an XGBRegressor that estimates its size from the forecast weather dynamics.
Our proposed multivariate climate-driven disease incidence model incorporates spatial (elevation, coordinates) and temporal (year, month) factors, along with key weather parameters (precipitation, sunlight, wind, relative humidity, temperature), to predict the likelihood of multiple diseases occurring in each area, serving as a foundation for future disease incidence prediction in the region. Across 72 evaluated experiments spanning four categories and six targets, pairwise Diebold-Mariano tests showed that the Transformer model had the highest number of statistically significant wins in climate-variable forecasting (n=18, 25.0%), compared with Long Short-Term Memory (LSTM) (n=9, 12.5%) and the Temporal Convolutional Network (TCN) (n=5, 6.9%). The hurdle model combining XGBClassifier and XGBRegressor outperformed the baseline in both malaria and dysentery forecasting. Error stratification revealed that the hurdle model provided the greatest benefit during incidence periods, and its mean absolute error (MAE) was substantially lower than the baseline's in both incidence and non-incidence periods. Our modular pipeline first forecasts climate variables and then predicts disease incidence, improving interpretability and generalization in data-sparse settings. Overall, this approach provides a scalable, climate-aware forecasting tool for public health planning, particularly in regions where these diseases are endemic, where climate change may affect their prevalence, or where data are scarce.
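The two-part logic of the hurdle model and the error stratification can be sketched without XGBoost itself. The callables prob_incidence and size_if_incidence are hypothetical stand-ins; with the xgboost scikit-learn wrapper they would presumably be an XGBClassifier's predict_proba(X)[:, 1] and an XGBRegressor's predict, the latter fitted only on periods with nonzero incidence. Whether the authors gate with a threshold or use the expected value is not stated, so both options are shown.

```python
def hurdle_predict(rows, prob_incidence, size_if_incidence, threshold=None):
    """Combine a zero-vs-nonzero classifier with a regressor fitted on nonzero periods.

    threshold=None returns the expected value p * size; otherwise a hard gate.
    """
    preds = []
    for row in rows:
        p = prob_incidence(row)
        size = max(0.0, size_if_incidence(row))
        if threshold is None:
            preds.append(p * size)
        else:
            preds.append(size if p >= threshold else 0.0)
    return preds


def mae(pairs):
    return sum(abs(t - p) for t, p in pairs) / len(pairs) if pairs else float("nan")


def stratified_mae(y_true, y_pred):
    """MAE separately for incidence (y > 0) and non-incidence (y == 0) periods."""
    pairs = list(zip(y_true, y_pred))
    return {"incidence": mae([q for q in pairs if q[0] > 0]),
            "non_incidence": mae([q for q in pairs if q[0] == 0]),
            "overall": mae(pairs)}


# Hypothetical stand-ins for the fitted stage-2 models.
def prob_incidence(row):
    return 0.9 if row["rain_mm"] > 50 else 0.1


def size_if_incidence(row):
    return 0.5 * row["rain_mm"]


rows = [{"rain_mm": 10}, {"rain_mm": 80}, {"rain_mm": 120}, {"rain_mm": 5}]
observed = [0, 35, 70, 0]
preds = hurdle_predict(rows, prob_incidence, size_if_incidence, threshold=0.5)
print(preds, stratified_mae(observed, preds))
```

Reporting MAE separately for zero and nonzero periods matters for zero-inflated series, because a model that always predicts zero can look good on overall MAE while missing every outbreak.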
Mayala, S.; Mzurikwao, D.; Suluba, E.
Training deep learning classifiers on large datasets is often constrained in countries with limited computational resources. While transfer learning can offset these limitations, standard architectures often retain a high memory footprint. This study introduces HybridNet-XR, a memory-efficient and computationally lightweight hybrid convolutional neural network (CNN) designed to bridge the domain gap in medical radiography using autonomous self-supervised learning protocols. The HybridNet-XR architecture integrates depthwise separable convolutions for parameter reduction, residual connections for gradient stability, and aggressive early downsampling to minimize the video RAM (VRAM) footprint. We evaluated several training paradigms, including teacher-free self-supervised learning (SSL-SimCLR), teacher-led knowledge distillation (KD), and domain-gap (DG) adaptation. Each variant was pre-trained on ImageNet-1k subsets and fine-tuned on the ChestX6 multi-class dataset. Model interpretability was validated through gradient-weighted class activation mapping (Grad-CAM). Performance-frontier analysis identified HybridNet-XR-150-PW (Pre-warmed) as the optimal configuration, achieving 93.38% average accuracy and a 99% AUC while using only 814.80 MB of VRAM. In class-wise accuracy, this variant significantly outperformed standard MobileNetV2 and the teacher-led models in critical diagnostic categories, notably COVID-19 (97.98%) and emphysema (96.80%). Grad-CAM visualizations confirmed that the teacher-free pre-warming phase allows the model to develop a sharper, anatomically grounded focus on pathological landmarks than the distilled models. Specialized pre-warming schedules thus offer a viable, computationally autonomous alternative to knowledge distillation for medical imaging.
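The two memory-saving ideas named in the architecture, depthwise separable convolutions and early downsampling, can be quantified with simple arithmetic. The layer sizes below (3×3 kernel, 64→128 channels, 224×224 input, a 32-channel stem at stride 1 versus stride 4) are hypothetical values chosen for illustration; the abstract does not give HybridNet-XR's actual configuration, and real VRAM use also depends on batch size, optimizer state, and framework overhead.

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def activation_mib(h, w, c, batch=1, bytes_per_value=4):
    """Memory of one float32 feature map (batch x c x h x w) in MiB."""
    return batch * c * h * w * bytes_per_value / 2 ** 20


std = conv_params(3, 64, 128)
sep = depthwise_separable_params(3, 64, 128)
print(std, sep, round(std / sep, 2))  # → 73728 8768 8.41

# Early downsampling: the same 32-channel feature map at full vs. 1/4 resolution.
print(activation_mib(224, 224, 32), activation_mib(56, 56, 32))  # → 6.125 0.3828125
```

The parameter saving comes from factoring one dense k×k×C_in×C_out kernel into a per-channel spatial filter and a channel-mixing 1×1 filter, while downsampling by 4 in each spatial dimension cuts every subsequent activation map by a factor of 16.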
By eliminating the need for high-performance teacher models, HybridNet-XR provides a robust and trustworthy diagnostic foundation suitable for clinical deployment in resource-constrained environments. Author summary: Traditional deep learning models for medical imaging are often too large for the low-power computers available in many global health settings. We developed a new model to bridge this computational gap. We designed HybridNet-XR, a highly efficient AI architecture, and trained it using a "teacher-free" method that doesn't require a massive supercomputer. We found a specific version (H-XR150-PW) that provides high accuracy while using very little memory. Our results show that high-performance diagnostic AI can be deployed on standard, low-cost hardware. Furthermore, using visual heatmaps (Grad-CAM), we showed that the AI correctly identifies medical landmarks such as lung opacities, supporting its safe and reliable use in real-world clinical settings.
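The teacher-free SSL-SimCLR paradigm relies on a contrastive objective: SimCLR trains the encoder with the NT-Xent (normalized temperature-scaled cross-entropy) loss, which pulls two augmented views of the same image together and pushes other images in the batch apart. The pure-Python sketch below shows that loss only; the view layout (embeddings `2k` and `2k+1` are views of sample `k`) and the temperature of 0.5 are assumptions, and the authors' pre-warming schedule is not described in the abstract. A real implementation would use a tensor framework on GPU.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nt_xent(z, tau=0.5):
    """NT-Xent loss averaged over all 2N anchors.

    Assumed layout: z[2k] and z[2k + 1] are the two augmented views of sample k.
    """
    n = len(z)
    if n % 2:
        raise ValueError("expected an even number of embeddings (pairs of views)")
    total = 0.0
    for i in range(n):
        j = i ^ 1  # positive partner: 0<->1, 2<->3, ...
        # Denominator runs over every other embedding in the batch (k != i).
        logits = [cosine(z[i], z[k]) / tau for k in range(n) if k != i]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += -(cosine(z[i], z[j]) / tau - log_denominator)
    return total / n


aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]     # views agree
misaligned = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]  # views disagree
print(nt_xent(aligned) < nt_xent(misaligned))  # → True
```

Because every other image in the batch serves as a negative, the objective needs no labels and no teacher network, which is what makes this paradigm "teacher-free" in contrast to knowledge distillation.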